- Home
- Search Results
- Page 1 of 1
Search for: All records
-
Total Resources2
- Resource Type
-
0002000000000000
- More
- Availability
-
20
- Author / Contributor
- Filter by Author / Creator
-
-
Phanishayee, A (2)
-
Venkataraman, S (2)
-
Akella, A (1)
-
Balasubramanian, A (1)
-
Chawla, S (1)
-
Devanur, N (1)
-
Mahajan, K (1)
-
Singhvi, A (1)
-
Stoica, I (1)
-
Thelin, J (1)
-
Wang, G (1)
-
#Tyler Phillips, Kenneth E. (0)
-
#Willis, Ciara (0)
-
& Abreu-Ramos, E. D. (0)
-
& Abramson, C. I. (0)
-
& Abreu-Ramos, E. D. (0)
-
& Adams, S.G. (0)
-
& Ahmed, K. (0)
-
& Ahmed, Khadija. (0)
-
& Aina, D.K. Jr. (0)
-
- Filter by Editor
-
-
& Spizer, S. M. (0)
-
& . Spizer, S. (0)
-
& Ahn, J. (0)
-
& Bateiha, S. (0)
-
& Bosch, N. (0)
-
& Brennan K. (0)
-
& Brennan, K. (0)
-
& Chen, B. (0)
-
& Chen, Bodong (0)
-
& Drown, S. (0)
-
& Ferretti, F. (0)
-
& Higgins, A. (0)
-
& J. Peters (0)
-
& Kali, Y. (0)
-
& Ruiz-Arias, P.M. (0)
-
& S. Spitzer (0)
-
& Sahin. I. (0)
-
& Spitzer, S. (0)
-
& Spitzer, S.M. (0)
-
(submitted - in Review for IEEE ICASSP-2024) (0)
-
-
Have feedback or suggestions for a way to improve these results?
!
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale. Existing parameter synchronization protocols cannot effectively leverage available network resources in the face of ever increasing hardware heterogeneity. To address this issue, we propose Blink, a collective communication library that dynamically generates optimal communication primitives by packing spanning trees. We propose techniques to minimize the number of trees generated and extend Blink to leverage heterogeneous communication channels for hybrid, and faster, data transfers. Evaluations show that compared to the state-of-the-art (NCCL), Blink can achieve up to 8× faster model synchronization (AllReduce), and reduce end-to-end DNN training time for image classification tasks by up to 40%.more » « less
-
Mahajan, K; Balasubramanian, A; Singhvi, A; Venkataraman, S; Akella, A; Phanishayee, A; Chawla, S (, 17th USENIX Symposium on Networked Systems Design and Implementation)Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement. We propose Themis, a new scheduling framework for ML training workloads. It's GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off fairness for efficiency in the short term, but ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.more » « less
An official website of the United States government

Full Text Available